On the uniform generation of random graphs with prescribed degree sequences 
Random graphs with prescribed degree sequences have been widely used as a model of complex 
networks. Comparing an observed network to an ensemble of such graphs allows one to detect 
deviations from randomness in network properties. Here we briefly review two existing methods for 
the generation of random graphs with arbitrary degree sequences, which we call the "switching" and 
"matching" methods, and present a new method based on the "go with the winners" Monte Carlo 
method. The matching method may suffer from nonuniform sampling, while the switching method 
has no general theoretical bound on its mixing time. The "go with the winners" method has neither 
of these drawbacks, but is slow. It can, however, be used to evaluate the reliability of the other two 
methods and, by doing this, we demonstrate that under realistic conditions the results of the switching 
and matching algorithms deviate only slightly from those of the "go with the winners" algorithm. 
Because of its combination of speed and accuracy we recommend the use of the switching method 
for most calculations. 



I. INTRODUCTION 

In the rapidly growing literature on the modeling of 
complex networks, one of the most important classes of 
network models is the random graph. One well-studied 
model of this kind is the ensemble of all graphs that 
have a given degree sequence, and this model has proved 
useful in understanding a variety of network properties. 
Realistic applications often require that we restrict 
ourselves to graphs with no multiple edges between any 
vertex pair and no self-edges. Unfortunately, both the 
analytic and the numerical study of such networks are 
known to present challenges. In this short paper 
we consider computer algorithms for generating graphs 
uniformly from this ensemble. We are concerned primarily 
with directed graphs, since the examples we will consider 
are directed, but the concepts discussed generalize 
in a straightforward fashion to the undirected case. 

There are two algorithms in common use for the generation 
of random graphs with single edges. We will refer to 
them as the switching algorithm and the matching 
algorithm. We argue that, under certain circumstances, 
both of these algorithms can generate a nonuniform 
sample of possible graphs. We then present a new 
algorithm based on the Monte Carlo procedure known as 
"go with the winners" [22, 23], which generates uniformly 
sampled graphs. We compare the three methods in the 
context of a particular network problem, the estimation 
of the density of commonly occurring subgraphs or motifs, 
and show that, in this context, the difference between 
them is small. This result is of some practical 
importance, since the "go with the winners" algorithm, 
although statistically correct, is slow, while the other 
two algorithms are substantially faster. 



II. ALGORITHMS 

In this section we describe the three algorithms under 
consideration. 
FIG. 1: Starting with the transcription network of E. coli, 
the network is randomized using the switching algorithm 
described in the text. We plot the number of feed-forward loops 
in the randomized networks vs. the number of switches performed 
per edge in the graph. The dashed line is the expected 
asymptotic value obtained using the "go with the winners" 
algorithm. Each point is an average over one hundred 
repetitions of the calculation. Error bars are ±3 standard 
deviations. The randomized network reaches the equilibrium value 
around one switch per edge on average. Similar results are 
obtained for other networks and other motifs. 
7.67(9)
3.05(3)    3.05(6)    2.98(6)
10.6(1)    10.5(2)    10.8(2)
11.06(6)   11.0(1)    11.1(1)
3.60(4)    3.71(7)    3.67(7)
14.1(2)    13.7(3)    13.8(3)
88(1)      88.3(3)    94.5(3)
10.7(7)    10.1(2)    10.0(2)
3.4(3)     3.6(1)     3.0(1)
2.20(5)    1.48(3)    1.45(3)
284(6)     286(6)     290(6)



TABLE I: Mean and standard deviation (s.d.) of the number of appearances of the feed-forward loop subgraph in random 
networks with degree sequences the same as the real world networks studied in [Tsl |. We used between 1000 and 10 000 random 
networks for each measurement. Z-scores are the number of standard deviations by which the real network deviates from the 
average of the random ensemble. Standard errors are shown in parentheses. 



A. Switching algorithm 

First, we describe the switching algorithm, which uses 
a Markov chain to generate a random graph with a given 
degree sequence. For simplicity, we discuss directed 
networks with no mutual edges (vertex pairs with edges 
running in both directions between them). The case with 
mutual edges is a simple generalization. 

The method starts from a given network and involves 
carrying out a series of Monte Carlo switching steps 
whereby a pair of edges (A → B, C → D) is selected at 
random and the ends are exchanged to give (A → D, C → B). 
The exchange is performed only if it generates no 
multiple edges or self-edges; otherwise the network is 
left unchanged. The entire process is repeated some number 
QE times, where E is the number of edges in the graph 
and Q is chosen large enough that the Markov chain 
shows good mixing. (Exchanges that are not performed 
because they would generate multiple or self-edges are 
still counted to ensure detailed balance.) 

This algorithm works well but, as with many Markov 
chain methods, suffers because in general we have no 
measure of how long we need to wait for it to mix 
properly. Theoretical bounds on the mixing time exist only 
for specific near-regular degree sequences. We find 
empirically, however, that for many networks, values of 
around Q = 100 appear to be more than adequate (see 
Fig. 1). 
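
The switching steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used for the results reported here: the function name `switch_randomize` and its parameters are hypothetical, the graph is assumed to be given as a list of (source, target) pairs, and mutual edges are not treated specially, so only the in- and out-degree of each vertex is preserved.

```python
import random

def switch_randomize(edges, q=100, seed=None):
    """Randomize a simple directed graph by repeated edge switching.

    edges: list of (source, target) pairs, no self-edges or multiple edges.
    q:     attempted switches per edge (the Q of the text).
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)                  # fast lookup of existing edges
    for _ in range(q * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed switch: (a -> b, c -> d) becomes (a -> d, c -> b).
        if a == d or c == b:
            continue                      # self-edge: rejected, step still counted
        if (a, d) in present or (c, b) in present:
            continue                      # multiple edge: rejected, step still counted
        present.difference_update([(a, b), (c, d)])
        present.update([(a, d), (c, b)])
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

Rejected proposals simply fall through to the next iteration, so they consume one of the QE steps without changing the graph, as the text requires.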



B. Matching algorithm 

An alternative approach is the matching algorithm, 
in which each vertex is assigned a set of "stubs" 
or "spokes", the sawn-off ends of incoming and 
outgoing edges, according to the desired degree sequence. 
(One can also assign mutual-edge stubs for networks that 
include such edges.) Then in-stubs and out-stubs are 
picked randomly in pairs and joined up to create the 
network edges. If a multiple or self-edge is created, the 
entire network is discarded and the process starts over from 
scratch. 
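
As a rough illustration of this procedure, the sketch below pairs out-stubs with in-stubs by shuffling one list against the other, which is one way of drawing a uniformly random pairing, and restarts whenever a self-edge or multiple edge appears. The function name `matching_graph` and the `max_restarts` cap are assumptions introduced for the example; mutual-edge stubs are not handled.

```python
import random

def matching_graph(out_deg, in_deg, seed=None, max_restarts=100_000):
    """Stub matching with a full restart on any self-edge or multiple edge.

    out_deg[v], in_deg[v]: desired out- and in-degree of vertex v.
    Returns a sorted list of (source, target) edges, or None if every
    attempt within max_restarts (a hypothetical safety cap) failed.
    """
    if sum(out_deg) != sum(in_deg):
        raise ValueError("in- and out-degrees must have the same total")
    rng = random.Random(seed)
    out_stubs = [v for v, k in enumerate(out_deg) for _ in range(k)]
    in_stubs = [v for v, k in enumerate(in_deg) for _ in range(k)]
    for _ in range(max_restarts):
        rng.shuffle(in_stubs)             # uniformly random pairing of stubs
        edges = set()
        for a, b in zip(out_stubs, in_stubs):
            if a == b or (a, b) in edges:
                break                     # illegal edge: discard the whole network
            edges.add((a, b))
        else:
            return sorted(edges)          # all stubs joined without conflict
    return None
```

For degree sequences with high-degree vertices most attempts are discarded, so the cap may be reached before a valid graph is found.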

This process will correctly generate random directed 
graphs with the desired properties. Unfortunately, 
however, many real-world networks have a heavy-tailed 
degree distribution that includes a small minority of 
vertices with high degree. All other things being equal, the 
expected number of edges between two such vertices will 
often exceed one, making it unlikely that the procedure 
above will run to completion, except in the rarest of cases. 
To obviate this problem a modification of the method can 
be used in which, following selection of a stub pair that 
creates a multiple edge, the network is not discarded, and 
an alternative stub pair is selected at random. In general 
this method generates a biased sample of possible 
networks but, as we will show, not significantly so for 
our purposes (see Table I). 
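
The modified procedure might look as follows. This is only a sketch: the name `matching_with_retry` is hypothetical, self-edges are retried in the same way as multiple edges, and the text does not say what happens when no legal stub pair remains, so the `max_tries` cut-off that gives up and returns `None` is an assumption made here.

```python
import random

def matching_with_retry(out_deg, in_deg, seed=None, max_tries=1000):
    """Stub matching that re-draws an offending stub pair instead of restarting.

    Returns a sorted edge list, or None if max_tries consecutive draws
    were all illegal (assumed handling of dead ends).
    """
    rng = random.Random(seed)
    out_stubs = [v for v, k in enumerate(out_deg) for _ in range(k)]
    in_stubs = [v for v, k in enumerate(in_deg) for _ in range(k)]
    edges = set()
    failures = 0
    while out_stubs:
        i = rng.randrange(len(out_stubs))
        j = rng.randrange(len(in_stubs))
        a, b = out_stubs[i], in_stubs[j]
        if a == b or (a, b) in edges:
            failures += 1                 # keep the network, draw another pair
            if failures > max_tries:
                return None               # probably no legal pair is left
            continue
        failures = 0
        edges.add((a, b))
        # Remove the two used stubs by overwriting them with the last entry.
        out_stubs[i] = out_stubs[-1]
        out_stubs.pop()
        in_stubs[j] = in_stubs[-1]
        in_stubs.pop()
    return sorted(edges)
```

Because illegal pairs are re-drawn rather than causing a restart, different graphs are no longer reached with equal probability, which is the source of the bias discussed in the text.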



C. Go with the Winners algorithm 

The "go with the winners" algorithm is a non-Markov- 
chain Monte Carlo method for sampling uniformly from 
a given distribution [22, 23]. When applied to the problem 
of graph generation, the method is as follows. We 
consider a colony of M graphs. As with the matching 
algorithm, we start with the appropriate number of in- 
stubs and out-stubs for each vertex and repeatedly choose 
at random one in-stub and one out-stub from the graph 
and link them together to create an edge. If a multiple 
edge or self-edge is generated, the network containing 
it is removed from the colony and discarded. To com- 
pensate for the resulting slow decline in the size of the 
colony, its size is periodically doubled by cloning each of 
the surviving graphs; this cloning step is carried out at a 
predetermined rate chosen to keep the size of the colony 
roughly constant on average. The process is repeated un- 
til all stubs have been linked, then one network is chosen 
at random from the colony and assigned a weight: 

W_i = 2^{-c} m,    (1)

where c is the number of cloning steps made and m is the 
number of surviving networks. The mean of any quantity 
X (for example, the number of occurrences of a given 
subgraph) over a set of such networks is then given by 

\langle X \rangle = \frac{\sum_i W_i X_i}{\sum_i W_i},    (2)

where X_i is the value of X in network i. 
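
A compact sketch of one run of this colony procedure, together with the weighted average, is given below. It is illustrative only: the names `go_with_the_winners` and `weighted_mean` and the `clone_steps` argument (the set of edge-placement steps after which the colony is doubled, fixed in advance) are assumptions, the weight is taken to be 2^{-c} m with c the number of cloning steps and m the number of survivors, and a run in which the whole colony dies is given weight zero.

```python
import random

def go_with_the_winners(out_deg, in_deg, colony_size, clone_steps, rng):
    """One run. Returns (edges, weight), or (None, 0.0) if the colony dies."""
    out0 = [v for v, k in enumerate(out_deg) for _ in range(k)]
    in0 = [v for v, k in enumerate(in_deg) for _ in range(k)]
    # Each colony member is (edge set, unused out-stubs, unused in-stubs).
    colony = [(set(), list(out0), list(in0)) for _ in range(colony_size)]
    c = 0                                 # number of cloning steps so far
    for step in range(len(out0)):
        survivors = []
        for edges, outs, ins in colony:
            a = outs.pop(rng.randrange(len(outs)))
            b = ins.pop(rng.randrange(len(ins)))
            if a != b and (a, b) not in edges:
                edges.add((a, b))
                survivors.append((edges, outs, ins))
            # otherwise this member is dropped from the colony
        colony = survivors
        if not colony:
            return None, 0.0
        if step in clone_steps:           # predetermined doubling schedule
            colony += [(set(e), list(o), list(i)) for e, o, i in colony]
            c += 1
    m = len(colony)                       # number of surviving networks
    edges, _, _ = rng.choice(colony)
    return sorted(edges), 2.0 ** (-c) * m

def weighted_mean(samples, quantity):
    """Weighted average of quantity(edges) over (edges, weight) samples."""
    total = sum(w for _, w in samples)
    return sum(w * quantity(g) for g, w in samples if w > 0) / total
```

The doubling schedule is passed in rather than adapted during the run, in keeping with the requirement that cloning occur at a predetermined rate; `weighted_mean` fails with a division by zero if every run died.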
FIG. 2: Uniformity tests of the three algorithms on a toy 
network. Panels (a) and (b) depict the two types of topologies of 
the 91 random networks studied, one of them like (a) and 90 
like (b). Panel (c) shows the frequency with which each 
configuration is sampled by our three algorithms. 100 000 graphs 
were generated with each algorithm, and the figure shows the 
fraction of graphs of each type generated. If sampling were 
uniform, each should appear with probability 1/91, which is 
indicated by the dotted lines. The "go with the winners" 
and switching algorithms sample uniformly within sampling 
error, passing both the Kolmogorov-Smirnov and Lilliefors 
Gaussianity tests. The matching algorithm under-samples the unique 
configuration (a). 



III. COMPARISON OF ALGORITHMS 

In Fig. 2 we show a comparison of the performance 
of our three algorithms when applied to a simple toy 
network. The network consists of an out-hub with ten 
outgoing edges, an in-hub with ten incoming edges, and 
ten nodes with one incoming edge and one outgoing edge 
each. Given this degree sequence, there are just two 
distinct network topologies with no multiple edges, as shown 
in Figs. 2(a) and 2(b). There is only a single way to form the 
network in Fig. 2(a), but there are 90 different ways to form 
the network in Fig. 2(b). 
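
The count of 1 + 90 = 91 graphs can be checked by brute force on smaller versions of the same construction. The sketch below is an illustration with hypothetical names: it generalizes the toy network to hubs of degree n, for which the same argument gives 1 + n(n - 1) simple graphs (91 for n = 10), and enumerates all candidates directly. The enumeration grows roughly as (n + 1)^n, so it is practical only for small n.

```python
from itertools import combinations, product

def count_toy_graphs(n):
    """Count simple directed graphs for the toy degree sequence with hub degree n.

    Vertex 0 is the out-hub (out-degree n), vertices 1..n are ordinary nodes
    (in-degree 1, out-degree 1) and vertex n + 1 is the in-hub (in-degree n).
    """
    in_hub = n + 1
    nodes = range(1, n + 1)
    count = 0
    # The out-hub picks n distinct targets, so it never forms a multiple edge.
    for hub_targets in combinations(range(1, n + 2), n):
        # Each ordinary node picks the target of its single outgoing edge.
        for node_targets in product(range(1, n + 2), repeat=n):
            if any(v == t for v, t in zip(nodes, node_targets)):
                continue                  # self-edge
            in_deg = [0] * (n + 2)
            for t in hub_targets + node_targets:
                in_deg[t] += 1
            if in_deg[in_hub] == n and all(in_deg[v] == 1 for v in nodes):
                count += 1
    return count
```

Calling `count_toy_graphs(3)` and `count_toy_graphs(4)` reproduces 1 + n(n - 1) for those sizes; the n = 10 case of the text is too large to enumerate this way.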

We generated 100 000 random networks using each of 
the three methods described here and the results are 
summarized in Fig. 2(c). As the figure shows, the matching 
algorithm introduces a bias, undersampling the configuration 
of Fig. 2(a). This is a result of the dynamics of the 
algorithm, which favors the creation of edges between hubs. 
The switching and "go with the winners" algorithms on 
the other hand sample the configurations uniformly, 
generating each graph an equal number of times within the 
measurement error on our calculations. The "go with the 
winners" algorithm truly samples the ensemble uniformly 
but is far less efficient than the two other methods. The 
results given here indicate that the switching algorithm 
produces essentially identical results while being a good 
deal faster. The matching algorithm is faster still but 
samples in a measurably biased way. 

Now consider the study of network motifs. We are 
interested in knowing when particular subgraphs or motifs 
appear significantly more or less often in a real-world 
network than would be expected on the basis of chance, and 
we can answer this question by comparing motif counts 
with those in random graphs. Some results for the case of the 
"feed-forward loop" motif [18, 19] are given in Table I. In this 
case the densities of motifs in the real-world networks 
are many standard deviations away from random, which 
suggests that any of the present algorithms is adequate 
for generating suitable random graphs to act as a null 
model, although the "go with the winners" and switching 
algorithms, while slower, are clearly more satisfactory 
theoretically. The matching algorithm was measurably 
nonuniform for our toy example above, but seems to give 
better results on the real-world problem. 
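
The comparison described above reduces to counting the motif in the real network and in each random graph and then forming a Z-score. A minimal sketch follows, with hypothetical function names. It assumes a feed-forward loop is a triple of distinct vertices with edges a → b, b → c and a → c, counts every such triple whether or not other edges are present among the three vertices, and uses the sample standard deviation; the counting convention behind Table I may differ.

```python
from statistics import mean, stdev

def count_feed_forward_loops(edges):
    """Count vertex triples (a, b, c) with edges a -> b, b -> c and a -> c."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    count = 0
    for a, targets in succ.items():
        for b in targets:
            for c in succ.get(b, ()):
                if c != a and c in targets:
                    count += 1
    return count

def z_score(real_count, random_counts):
    """Number of standard deviations by which real_count exceeds the random mean."""
    return (real_count - mean(random_counts)) / stdev(random_counts)
```

In use, `random_counts` would hold the motif counts of graphs produced by any of the three generators, for instance `[count_feed_forward_loops(g) for g in random_graphs]`; for "go with the winners" samples the weighted mean of Eq. (2) would replace the plain mean.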

Overall, our results appear to argue in favor of using 
the switching method, with the "go with the winners" 
method finding limited use as a check on the accuracy 
of sampling. Accuracy checks are also supplied by analytical 
estimates for subgraph numbers. Numerical 
results in [T3 . figt |20| were done using the switching al- 
gorithm. 



IV. CONCLUSIONS 

In this paper we have compared three algorithms for 
generating random graphs with prescribed degree se- 
quences and no multiple edges or self-edges. Two of the 
three have been used previously, but suffer from nonuniformity 
in their sampling properties, while the third, a 
method based on the "go with the winners" Monte Carlo 
procedure, is new and samples uniformly but is quite 
slow. Of the two older algorithms, we show that one, 
which we call the "matching" algorithm, has measurable 
deviations from uniformity when compared to the "go 
with the winners" method, although for graphs typical 
of practical studies these deviations are small enough to 
make no significant difference to most previously pub- 
lished results. The other older algorithm, which we 
call the "switching" algorithm and which is based on a 
Markov chain Monte Carlo method, samples correctly in 
the limit of long times and in practice is found to give 
good results when compared with the "go with the win- 
ners" method. Overall, therefore, we conclude that the 
switching algorithm is probably the algorithm of choice, 
with the "go with the winners" algorithm finding a sup- 
porting role as a check on uniformity, although its slow- 
ness makes it impractical for large-scale use. 

We thank Oliver D. King for discussions and for pointing 
out and demonstrating that the matching algorithm of 
the supplementary online material of [19] does not 
uniformly generate simple graphs. 